In this project, I outline a computational/bibliometric approach to classifying epistemic cultures. As science studies scholar Karin Knorr-Cetina (1999) highlights, science is not a uniform community. Instead, science is broken up into various cultures that produce different kinds of knowledge through diverse ideologies and practices. An epistemic culture refers to “those sets of practices, arrangements and mechanisms bound together by necessity, affinity and historical coincidence which, in a given area of professional expertise, make up how we know what we know” (Knorr-Cetina 2007, 363). Knorr-Cetina highlights the ways that nonhuman objects fit into the production of knowledge in these contexts, noting that the triangular relations between humans and their respective scientific objects often change over time based on social, financial, and technological factors.
Annemarie Mol (2002) likewise finds that biomedical objects can be enacted in diverse ways and, in turn, mean different things for different social actors. For example, she shows that atherosclerosis is not one object but a multiplicity that produces distinct material-discursive effects depending on how researchers, clinicians, or patients relate to it. In their new book, Rebecca Jordan-Young and Katrina Karkazis (forthcoming) similarly show that testosterone is a multiplici-T: a hormone that is constructed in distinct, and often conflicting, ways depending on the assumptions that researchers make about it. Building on this literature in feminist science studies, I am interested in the distinct biomedical cultures that use testosterone and in how to classify scientific cultures that use biomarkers like testosterone in their research.
In my work, I examine how researchers use testosterone across scientific cultures. Below, I outline the reiterative process I followed to classify scientific cultures using Aria & Cuccurullo’s (2017) bibliometrix package in R. The bibliometrix package provides a set of tools for quantitative research in bibliometrics and scientometrics. Generally, my classification process follows four steps:
In the first step, I generated a sample of all the research that used testosterone over the past 40 years. To do this, I used the search terms “TS=(testosterone)” in the Web of Science (WoS) Core Collection database, limiting my search to English articles and reviews from 1980-2016. This search yielded 58,642 results. Obviously, the size of this literature makes it difficult to know where to start exploring how different types of scientists use testosterone. Thus, the next step was to conduct a search of the most prominent keywords in the overall literature, which I found using the bibliometrix package.
To use bibliometrix, first install and load the package. Next, convert the WoS files into a dataframe.
#install.packages('bibliometrix')
library(bibliometrix)
## To cite bibliometrix in publications, please use:
##
## Aria, M. & Cuccurullo, C. (2017) bibliometrix: An R-tool for comprehensive science mapping analysis, Journal of Informetrics, 11(4), pp 959-975, Elsevier.
##
##
## http:\\www.bibliometrix.org
##
##
## To start with the shiny web-interface, please digit:
## biblioshiny()
setwd("C:/Users/soren/Google Drive/Biomedical MultipliciTs/1. Evidence Infrastructure/2. Domain Parsing")
tsearch_raw <- readFiles('tsearch_1-58642.txt')
tsearch <- convert2df(tsearch_raw, dbsource = "isi", format = "plaintext")
# (I've hidden the extraction process to make this file more concise.)
Then use the biblioAnalysis function to find the most prominent authors, most highly cited papers, and most commonly occurring keywords in the literature. You can either explore these results in R or send them to a .csv like I did.
tsearch_bib <- biblioAnalysis(tsearch, sep = ";")
tsearch_stats <- summary(object = tsearch_bib, k = 500, pause = FALSE)
# Explore results in R
# tsearch_stats$MostRelKeywords
# Output to .csv
write.table(as.data.frame(tsearch_stats$MostRelKeywords),file="tsearch_keywords.csv", quote=F,sep=",",row.names=F)
# (I've hidden the extraction process to make this file more concise.)
I opted to split the literature into various cultural domains based on prominent topics that surfaced in the top-500 keywords. For the manual coding process, I took multiple passes at defining the cultural domains in testosterone research. Generally, I tried to classify cultures based on keywords that were coupled to a general health and/or behavioral outcome. For the most part, this was easy. Keywords related to cardiovascular disease or endocrine disrupting chemicals, for example, each fell into their own domains. Other domains, however, were not so clear. In some cases, I found significant overlap between domains but ultimately ended up separating them, as with prostate, breast, and testicular cancer. It should be noted that while the cultural domains are usually distinct, some of them certainly overlap. For example, it is not possible to fully separate research on cardiovascular disease from research on metabolic diseases, even though most consider these health outcomes distinct domains. In the future, I plan to write more about the implications of this overlap for scientific exchanges.
Before moving on, I also want to highlight two theoretical commitments that shaped my results. First, following feminist science studies (Fine 2017; Jordan-Young and Karkazis forthcoming; Oudshoorn 1994; Roberts 2007), I opted not to split testosterone research domains into sex/gender-specific terms. Doing so would perpetuate a misleading and dangerous form of essentialism that my work is attempting to critique. Second, following actor-network theory (Latour 2005), I opted not to split animal research into a distinct domain. While this split was tempting (as you can see in some of my preliminary coding), I ultimately found that animal research was used in almost every domain because Tsearchers use animal models to inform their epistemic models.
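The manual coding step can be pictured as building a keyword-to-domain lookup table. What follows is a minimal sketch in base R, assuming each keyword is assigned to exactly one domain. The `domain_lookup` vector and the `code_keywords()` helper are hypothetical names, and the handful of assignments shown are illustrative rather than the full coding scheme.

```r
# Hypothetical lookup table: each keyword maps to a single domain.
# The assignments below are examples only, not the complete coding scheme.
domain_lookup <- c(
  "CARDIOVASCULAR-DISEASE"    = "Cardiovascular Disease",
  "CORONARY-ARTERY-DISEASE"   = "Cardiovascular Disease",
  "POLYCYSTIC OVARY SYNDROME" = "Polycystic Ovary Syndrome",
  "PROSTATE CANCER"           = "Prostate Cancer",
  "INSULIN-RESISTANCE"        = "Metabolic Diseases"
)

# Hypothetical helper: attach a domain to each keyword (NA if not yet coded).
code_keywords <- function(keywords, lookup) {
  keywords <- toupper(trimws(keywords))
  domain <- unname(lookup[keywords])
  data.frame(keyword = keywords, domain = domain, stringsAsFactors = FALSE)
}

coded <- code_keywords(c("Prostate Cancer", "insulin-resistance", "MEN"), domain_lookup)

# Keywords that still need a coding decision on the next pass
uncoded <- coded$keyword[is.na(coded$domain)]
```

Keeping the uncoded keywords in a separate vector makes each additional coding pass explicit: anything left in `uncoded` still needs to be assigned to a domain or deliberately set aside.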
Now, some results: After the first stage of splitting, I ended up with 20 different domains of research, including topics like cardiovascular disease, polycystic ovary syndrome, and bone research. However, after lumping terms together, it became obvious that some of these domains were too general. For example, the “development” hub encompassed nearly 50 keywords while most other domains had only around 10 terms. On the other hand, the “cancer” hub only had 10 terms, but produced a literature nearly twice the size of any other domain. Thus, after another round of splitting, I ended with 25 domains of research including:
Aging; Bone; Breast Cancer; Cardiovascular Disease; Dermatology; Disorders of Sex Development and Sex Differences; Endocrine Disrupting Chemicals; Fertility, Infertility, Reproduction, and Sterility; Immunology; Non-Testosterone Interventions; Metabolic Diseases; Methods; Muscle; Neurology/Mental Health; Obesity; Polycystic Ovary Syndrome; Prostate Cancer; Puberty; Quantity; Sexual Medicine; Social Neuroendocrinology; Surgical Procedures; Testicular Cancer; Testosterone Therapies; and Transgender Health.
To ensure that I included all the relevant search terms for each domain, I decided to conduct a sensitivity analysis akin to Shwed and Bearman’s (2010) paper on scientific consensus. To do this, I inserted all of the domain-specific keywords from Step 2 into a WoS search for each domain. In turn, I followed the same procedure outlined above to download the WoS data, output the top-100 keywords, and include relevant topics that were not included in my original search terms. For example, here are the search terms and sensitivity analyses I conducted for the Cardiovascular Disease Domain.
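The comparison at the heart of this sensitivity analysis amounts to finding top keywords that are not yet among a domain's search terms. Below is a minimal sketch in base R under that assumption. The helpers `normalize_kw()` and `candidate_terms()` are hypothetical, and the `search_terms` vector is a made-up example rather than the actual CVD search string.

```r
# Hypothetical helper: uppercase and treat hyphens and whitespace alike,
# so "BLOOD-PRESSURE" and "blood pressure" compare as equal.
normalize_kw <- function(x) gsub("[-[:space:]]+", " ", toupper(trimws(x)))

# Hypothetical helper: top keywords not already covered by the search terms.
candidate_terms <- function(top_keywords, search_terms) {
  top <- normalize_kw(top_keywords)
  keep <- !(top %in% normalize_kw(search_terms)) & !duplicated(top)
  top[keep]
}

# Illustrative inputs only (not the real search terms for any domain)
search_terms <- c("cardiovascular disease", "hypertension", "atherosclerosis")
top_keywords <- c("CARDIOVASCULAR-DISEASE", "CORONARY-ARTERY-DISEASE", "HYPERTENSION",
                  "MYOCARDIAL-INFARCTION", "BLOOD PRESSURE", "BLOOD-PRESSURE")

candidates <- candidate_terms(top_keywords, search_terms)
```

The candidates still need a manual relevance check before they are added to a search, since frequent keywords such as "MEN" or "RISK" are not specific to any one domain.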
To carry out this process in R, load your WoS .txt files and convert them to dataframes.
setwd("C:/Users/soren/Google Drive/Biomedical MultipliciTs/1. Evidence Infrastructure/3. Reiterative Analysis")
# Importing the 25 Domain Datasets
aging <- readLines('aging_6131.txt')
bone <- readLines('bone_4851.txt')
breast_cancer <- readLines('breast_cancer_1979.txt')
cvd <- readLines('cvd_4175.txt')
derm <- readLines('derm_3252.txt')
dsd <- readLines('dsd_4615.txt')
edcs <- readLines('edcs_3125.txt')
firs <- readLines('firs_5846.txt')
immuno <- readLines('immuno_2631.txt')
interventions <- readLines('interventions_5966.txt')
metabolic <- readLines('metabolic_4912.txt')
methods <- readLines('methods_6261.txt')
muscle <- readLines('muscle_4249.txt')
neuro <- readLines('neuro_5125.txt')
obesity <- readLines('obesity_6489.txt')
pcos <- readLines('pcos_3353.txt')
prostate_cancer <- readLines('prostate_cancer_5438.txt')
puberty <- readLines('puberty_4591.txt')
quantity <- readLines('quantity_6493.txt')
sexmed <- readLines('sexmed_2980.txt')
snet <- readLines('snet_5553.txt')
surgical <- readLines('surgical_6498.txt')
test_cancer <- readLines('testicular_cancer_1345.txt')
testo_therapies <- readLines('testo_therapies_8185.txt')
transhealth <- readLines('transhealth_464.txt')
# Convert to Dataframes
aging <- isi2df(aging)
bone <- isi2df(bone)
breast_cancer <- isi2df(breast_cancer)
cvd <- isi2df(cvd)
derm <- isi2df(derm)
dsd <- isi2df(dsd)
edcs <- isi2df(edcs)
firs <- isi2df(firs)
immuno <- isi2df(immuno)
interventions <- isi2df(interventions)
metabolic <- isi2df(metabolic)
methods <- isi2df(methods)
neuro <- isi2df(neuro)
muscle <- isi2df(muscle)
obesity <- isi2df(obesity)
pcos <- isi2df(pcos)
puberty <- isi2df(puberty)
prostate_cancer <- isi2df(prostate_cancer)
quantity <- isi2df(quantity)
sexmed <- isi2df(sexmed)
snet <- isi2df(snet)
surgical <- isi2df(surgical)
test_cancer <- isi2df(test_cancer)
testo_therapies <- isi2df(testo_therapies)
transhealth <- isi2df(transhealth)
# (I've hidden the extraction process to make this file more concise.)
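The import and conversion steps above repeat the same two calls for every domain, so they could also be driven from a single index of file names. The following is a sketch only: `file_index` and `import_domains()` are hypothetical names, only three of the 25 files are listed for brevity, and the helper assumes that bibliometrix is installed and that the WoS export files are in `dir`.

```r
# File names follow the pattern used above: <domain>_<record count>.txt
domain_files <- c("aging_6131.txt", "cvd_4175.txt", "transhealth_464.txt")

# Index of domains, with the record count parsed from each file name
file_index <- data.frame(
  domain    = sub("_[0-9]+\\.txt$", "", domain_files),
  n_records = as.integer(sub("^.*_([0-9]+)\\.txt$", "\\1", domain_files)),
  file      = domain_files,
  stringsAsFactors = FALSE
)

# Hypothetical helper: read and convert every file with the same
# readLines() + isi2df() calls used above, returning a named list.
import_domains <- function(index, dir = ".") {
  out <- lapply(index$file, function(f) {
    bibliometrix::isi2df(readLines(file.path(dir, f)))
  })
  names(out) <- index$domain
  out
}

# domains <- import_domains(file_index)   # then, for example, domains$cvd
```

A named list keeps all domains in one object, which makes it easier to apply the same analysis to each of them later.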
Then, conduct the bibliometrix analysis for each domain. I’ve provided a sample of the top-25 authors, articles, and keywords for the CVD domain below.
setwd("C:/Users/soren/Google Drive/Biomedical MultipliciTs/1. Evidence Infrastructure/3. Reiterative Analysis")
cvd_bib <- biblioAnalysis(cvd, sep = ";")
cvd_stats <- summary(object = cvd_bib, k = 25, pause = FALSE)
##
##
## Main Information about data
##
## Documents 4175
## Sources (Journals, Books, etc.) 1085
## Keywords Plus (ID) 7359
## Author's Keywords (DE) 4957
## Period 1983 - 2016
## Average citations per documents 35.2
##
## Authors 15515
## Author Appearances 23580
## Authors of single authored documents 107
## Authors of multi authored documents 15408
##
## Documents per Author 0.269
## Authors per Document 3.72
## Co-Authors per Documents 5.65
## Collaboration Index 3.94
##
## Document types
## B 23
## J 4123
## S 29
##
##
## Annual Scientific Production
##
## Year Articles
## 1983 3
## 1985 1
## 1989 3
## 1990 7
## 1991 38
## 1992 41
## 1993 51
## 1994 45
## 1995 49
## 1996 75
## 1997 69
## 1998 79
## 1999 79
## 2000 67
## 2001 108
## 2002 115
## 2003 134
## 2004 119
## 2005 152
## 2006 145
## 2007 209
## 2008 219
## 2009 227
## 2010 247
## 2011 278
## 2012 309
## 2013 300
## 2014 308
## 2015 342
## 2016 356
##
## Annual Percentage Growth Rate 17.90401
##
##
## Most Productive Authors
##
## Authors Articles Authors Articles Fractionalized
## 1 MAGGI M 86 JONES TH 17.81
## 2 CORONA G 71 MAGGI M 11.97
## 3 JONES TH 60 YEAP BB 11.08
## 4 FORTI G 48 BASARIA S 10.87
## 5 MANNUCCI E 45 CORONA G 10.42
## 6 CHANNER KS 44 BJORNTORP P 9.71
## 7 BASARIA S 38 SAAD F 9.32
## 8 BHASIN S 38 CHANNER KS 9.25
## 9 RASTRELLI G 38 GOOREN L 8.45
## 10 HANDELSMAN DJ 35 TRAISH AM 7.60
## 11 YEAP BB 34 BHASIN S 7.34
## 12 WALLASCHOFSKI H 33 SCHOOLING CM 6.74
## 13 SAAD F 32 KHALIL RA 6.42
## 14 HARING R 31 FORTI G 6.30
## 15 VOLZKE H 29 MORLEY JE 6.19
## 16 NAUCK M 28 HANDELSMAN DJ 6.18
## 17 SFORZA A 28 RECKELHOFF JF 6.07
## 18 DOBS AS 24 BARRETT-CONNOR E 5.77
## 19 AVERSA A 22 SVARTBERG J 5.57
## 20 JONES RD 22 MANNUCCI E 5.54
## 21 SVARTBERG J 21 DAVIS SR 5.43
## 22 BJORNTORP P 20 DOBS AS 5.32
## 23 RECKELHOFF JF 20 NIESCHLAG E 5.21
## 24 VIGNOZZI L 20 AVERSA A 5.11
## 25 LENZI A 19 ZITZMANN M 4.94
##
##
## Top manuscripts per citations
##
## Paper TC TCperYear
## 1 HAYES JD, 2005, ANNU REV PHARMACOL 1935 148.8
## 2 HARMAN SM, 2001, J CLIN ENDOCR METAB 1385 81.5
## 3 KNOCHENHAUER ES, 1998, J CLIN ENDOCR METAB 1073 53.6
## 4 FELDMAN HA, 2002, J CLIN ENDOCR METAB 882 55.1
## 5 GRAY A, 1991, J CLIN ENDOCR METAB 740 27.4
## 6 BJORNTORP P, 1991, DIABETES CARE 730 27.0
## 7 BASARIA S, 2010, NEW ENGL J MED 675 84.4
## 8 MENDELSOHN ME, 2005, SCIENCE 616 47.4
## 9 KAUFMAN JM, 2005, ENDOCR REV 613 47.2
## 10 HEIDENREICH A, 2014, EUR UROL 572 143.0
## 11 RECKELHOFF JF, 2001, HYPERTENSION 554 32.6
## 12 ATTARD G, 2008, J CLIN ONCOL 541 54.1
## 13 LAAKSONEN DE, 2004, DIABETES CARE 536 38.3
## 14 TCHERNOF A, 2013, PHYSIOL REV 513 102.6
## 15 APRIDONIDZE T, 2005, J CLIN ENDOCR METAB 505 38.8
## 16 BJORNTORP P, 1996, INT J OBESITY 473 21.5
## 17 LAUGHLIN GA, 2008, J CLIN ENDOCR METAB 444 44.4
## 18 KHAW KT, 2007, CIRCULATION 440 40.0
## 19 BAUMGARTNER RN, 1999, MECH AGEING DEV 439 23.1
## 20 LIU PY, 2003, ENDOCR REV 432 28.8
## 21 KAPOOR D, 2006, EUR J ENDOCRINOL 426 35.5
## 22 MARIN P, 1992, INT J OBESITY 422 16.2
## 23 RHODEN EL, 2004, NEW ENGL J MED 421 30.1
## 24 YAGGI HK, 2006, DIABETES CARE 401 33.4
## 25 WU FCW, 2003, ENDOCR REV 400 26.7
##
##
## Most Productive Countries (of corresponding authors)
##
## Country Articles Freq SCP MCP MCP_Ratio
## 1 USA 1193 0.29147 980 213 0.1785
## 2 ITALY 314 0.07672 257 57 0.1815
## 3 UNITED KINGDOM 265 0.06474 195 70 0.2642
## 4 GERMANY 201 0.04911 137 64 0.3184
## 5 AUSTRALIA 182 0.04447 158 24 0.1319
## 6 CHINA 164 0.04007 133 31 0.1890
## 7 TURKEY 161 0.03934 154 7 0.0435
## 8 JAPAN 136 0.03323 120 16 0.1176
## 9 CANADA 108 0.02639 84 24 0.2222
## 10 SPAIN 106 0.02590 92 14 0.1321
## 11 BRAZIL 105 0.02565 94 11 0.1048
## 12 SWEDEN 103 0.02516 73 30 0.2913
## 13 NETHERLANDS 83 0.02028 68 15 0.1807
## 14 FRANCE 75 0.01832 62 13 0.1733
## 15 POLAND 70 0.01710 60 10 0.1429
## 16 GREECE 68 0.01661 58 10 0.1471
## 17 TAIWAN 67 0.01637 61 6 0.0896
## 18 FINLAND 59 0.01441 49 10 0.1695
## 19 DENMARK 52 0.01270 49 3 0.0577
## 20 KOREA 46 0.01124 41 5 0.1087
## 21 IRAN 38 0.00928 37 1 0.0263
## 22 NORWAY 36 0.00880 29 7 0.1944
## 23 BELGIUM 33 0.00806 23 10 0.3030
## 24 MEXICO 31 0.00757 26 5 0.1613
## 25 INDIA 29 0.00709 26 3 0.1034
##
##
## SCP: Single Country Publications
##
## MCP: Multiple Country Publications
##
##
## Total Citations per Country
##
## Country Total Citations Average Article Citations
## 1 USA 51866 43.48
## 2 UNITED KINGDOM 19970 75.36
## 3 ITALY 10271 32.71
## 4 GERMANY 7848 39.04
## 5 AUSTRALIA 6894 37.88
## 6 SWEDEN 5972 57.98
## 7 NETHERLANDS 4225 50.90
## 8 CANADA 3415 31.62
## 9 JAPAN 2830 20.81
## 10 TURKEY 2792 17.34
## 11 FINLAND 2654 44.98
## 12 FRANCE 2444 32.59
## 13 SPAIN 2336 22.04
## 14 CHINA 1968 12.00
## 15 GREECE 1672 24.59
## 16 DENMARK 1618 31.12
## 17 NORWAY 1543 42.86
## 18 BELGIUM 1449 43.91
## 19 POLAND 1392 19.89
## 20 BRAZIL 1286 12.25
## 21 ISRAEL 918 38.25
## 22 TAIWAN 868 12.96
## 23 AUSTRIA 587 24.46
## 24 MEXICO 577 18.61
## 25 SWITZERLAND 520 18.57
##
##
## Most Relevant Sources
##
## Sources Articles
## 1 JOURNAL OF CLINICAL ENDOCRINOLOGY & METABOLISM 201
## 2 CLINICAL ENDOCRINOLOGY 117
## 3 JOURNAL OF SEXUAL MEDICINE 117
## 4 EUROPEAN JOURNAL OF ENDOCRINOLOGY 83
## 5 JOURNAL OF ENDOCRINOLOGICAL INVESTIGATION 58
## 6 METABOLISM-CLINICAL AND EXPERIMENTAL 55
## 7 ATHEROSCLEROSIS 53
## 8 GYNECOLOGICAL ENDOCRINOLOGY 52
## 9 AGING MALE 45
## 10 ENDOCRINOLOGY 45
## 11 HYPERTENSION 45
## 12 PLOS ONE 41
## 13 FERTILITY AND STERILITY 40
## 14 AMERICAN JOURNAL OF PHYSIOLOGY-HEART AND CIRCULATORY PHYSIOLOGY 38
## 15 MATURITAS 38
## 16 INTERNATIONAL JOURNAL OF OBESITY 33
## 17 HORMONE AND METABOLIC RESEARCH 30
## 18 INTERNATIONAL JOURNAL OF IMPOTENCE RESEARCH 30
## 19 ASIAN JOURNAL OF ANDROLOGY 29
## 20 HUMAN REPRODUCTION 29
## 21 EXPERIMENTAL AND CLINICAL ENDOCRINOLOGY & DIABETES 27
## 22 JOURNAL OF ENDOCRINOLOGY 27
## 23 MENOPAUSE-THE JOURNAL OF THE NORTH AMERICAN MENOPAUSE SOCIETY 27
## 24 STEROIDS 27
## 25 AMERICAN JOURNAL OF PHYSIOLOGY-ENDOCRINOLOGY AND METABOLISM 26
##
##
## Most Relevant Keywords
##
## Author Keywords (DE) Articles Keywords-Plus (ID) Articles
## 1 TESTOSTERONE 975 CARDIOVASCULAR-DISEASE 827
## 2 ERECTILE DYSFUNCTION 221 TESTOSTERONE 744
## 3 ANDROGENS 210 INSULIN-RESISTANCE 555
## 4 METABOLIC SYNDROME 196 MEN 513
## 5 HYPOGONADISM 187 METABOLIC SYNDROME 471
## 6 INSULIN RESISTANCE 168 POSTMENOPAUSAL WOMEN 411
## 7 POLYCYSTIC OVARY SYNDROME 163 CORONARY-ARTERY-DISEASE 374
## 8 CARDIOVASCULAR DISEASE 153 MIDDLE-AGED MEN 334
## 9 HYPERTENSION 137 RISK-FACTORS 327
## 10 OBESITY 136 HORMONE-BINDING GLOBULIN 326
## 11 ESTROGEN 116 RISK 319
## 12 SEX HORMONES 114 OLDER MEN 318
## 13 ESTRADIOL 102 WOMEN 283
## 14 ATHEROSCLEROSIS 99 CORONARY-HEART-DISEASE 273
## 15 PROSTATE CANCER 93 ELDERLY-MEN 254
## 16 AGING 91 SEX-HORMONES 247
## 17 HORMONES 86 DISEASE 243
## 18 ANDROGEN 82 HEART-DISEASE 233
## 19 BLOOD PRESSURE 80 ENDOGENOUS SEX-HORMONES 232
## 20 DIABETES 75 MYOCARDIAL-INFARCTION 230
## 21 CARDIOVASCULAR RISK 74 BLOOD-PRESSURE 218
## 22 GENDER 71 REPLACEMENT THERAPY 203
## 23 MENOPAUSE 65 LOW SERUM TESTOSTERONE 202
## 24 LIPIDS 62 ASSOCIATION 200
## 25 PCOS 62 PREVALENCE 200
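The "Annual Percentage Growth Rate" reported in the summary above is not explained in the output itself. The value appears consistent with a compound growth rate computed from the first and last rows of the annual production table, with the number of listed years minus one as the exponent. That formula is an assumption inferred from the numbers shown here, not a statement about how bibliometrix implements it.

```r
# Values taken from the "Annual Scientific Production" table above
first_year_articles <- 3    # 1983
last_year_articles  <- 356  # 2016
n_listed_years      <- 30   # rows in the table (1984 and 1986-1988 are absent)

# Assumed formula: compound growth over (listed years - 1) intervals
growth_rate <- ((last_year_articles / first_year_articles)^(1 / (n_listed_years - 1)) - 1) * 100
round(growth_rate, 2)
```

Under this assumption the result is approximately 17.9, in line with the reported figure. Because years with no publications are not listed, the exponent counts table rows rather than calendar years, which is worth keeping in mind when comparing growth rates across domains.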
Please note the change in k: the CVD example above uses k = 25, whereas my analyses for all 25 domains below use k = 100.
setwd("C:/Users/soren/Google Drive/Biomedical MultipliciTs/1. Evidence Infrastructure/3. Reiterative Analysis")
aging_bib <- biblioAnalysis(aging, sep = ";")
aging_stats <- summary(object = aging_bib, k = 100, pause = FALSE)
write.table(as.data.frame(aging_stats$MostRelKeywords),file="aging_keywords.csv", quote=F,sep=",",row.names=F)
write.table(as.data.frame(aging_stats$AnnualProduction),file="aging_output.csv", quote=F,sep=",",row.names=F)
bone_bib <- biblioAnalysis(bone, sep = ";")
bone_stats <- summary(object = bone_bib, k = 100, pause = FALSE)
write.table(as.data.frame(bone_stats$MostRelKeywords),file="bone_keywords.csv", quote=F,sep=",",row.names=F)
write.table(as.data.frame(bone_stats$AnnualProduction),file="bone_output.csv", quote=F,sep=",",row.names=F)
breast_cancer_bib <- biblioAnalysis(breast_cancer, sep = ";")
breast_cancer_stats <- summary(object = breast_cancer_bib, k = 100, pause = FALSE)
write.table(as.data.frame(breast_cancer_stats$MostRelKeywords),file="breast_cancer_keywords.csv", quote=F,sep=",",row.names=F)
write.table(as.data.frame(breast_cancer_stats$AnnualProduction),file="breast_cancer_output.csv", quote=F,sep=",",row.names=F)
cvd_bib <- biblioAnalysis(cvd, sep = ";")
cvd_stats <- summary(object = cvd_bib, k = 100, pause = FALSE)
write.table(as.data.frame(cvd_stats$MostRelKeywords),file="cvd_keywords.csv", quote=F,sep=",",row.names=F)
write.table(as.data.frame(cvd_stats$AnnualProduction),file="cvd_output.csv", quote=F,sep=",",row.names=F)
derm_bib <- biblioAnalysis(derm, sep = ";")
derm_stats <- summary(object = derm_bib, k = 100, pause = FALSE)
write.table(as.data.frame(derm_stats$MostRelKeywords),file="derm_keywords.csv", quote=F,sep=",",row.names=F)
write.table(as.data.frame(derm_stats$AnnualProduction),file="derm_output.csv", quote=F,sep=",",row.names=F)
dsd_bib <- biblioAnalysis(dsd, sep = ";")
dsd_stats <- summary(object = dsd_bib, k = 100, pause = FALSE)
write.table(as.data.frame(dsd_stats$MostRelKeywords),file="dsd_keywords.csv", quote=F,sep=",",row.names=F)
write.table(as.data.frame(dsd_stats$AnnualProduction),file="dsd_output.csv", quote=F,sep=",",row.names=F)
edcs_bib <- biblioAnalysis(edcs, sep = ";")
edcs_stats <- summary(object = edcs_bib, k = 100, pause = FALSE)
write.table(as.data.frame(edcs_stats$MostRelKeywords),file="edcs_keywords.csv", quote=F,sep=",",row.names=F)
write.table(as.data.frame(edcs_stats$AnnualProduction),file="edcs_output.csv", quote=F,sep=",",row.names=F)
firs_bib <- biblioAnalysis(firs, sep = ";")
firs_stats <- summary(object = firs_bib, k = 100, pause = FALSE)
write.table(as.data.frame(firs_stats$MostRelKeywords),file="firs_keywords.csv", quote=F,sep=",",row.names=F)
write.table(as.data.frame(firs_stats$AnnualProduction),file="firs_output.csv", quote=F,sep=",",row.names=F)
immuno_bib <- biblioAnalysis(immuno, sep = ";")
immuno_stats <- summary(object = immuno_bib, k = 100, pause = FALSE)
write.table(as.data.frame(immuno_stats$MostRelKeywords),file="immuno_keywords.csv", quote=F,sep=",",row.names=F)
write.table(as.data.frame(immuno_stats$AnnualProduction),file="immuno_output.csv", quote=F,sep=",",row.names=F)
interventions_bib <- biblioAnalysis(interventions, sep = ";")
interventions_stats <- summary(object = interventions_bib, k = 100, pause = FALSE)
write.table(as.data.frame(interventions_stats$MostRelKeywords),file="interventions_keywords.csv", quote=F,sep=",",row.names=F)
write.table(as.data.frame(interventions_stats$AnnualProduction),file="interventions_output.csv", quote=F,sep=",",row.names=F)
metabolic_bib <- biblioAnalysis(metabolic, sep = ";")
metabolic_stats <- summary(object = metabolic_bib, k = 100, pause = FALSE)
write.table(as.data.frame(metabolic_stats$MostRelKeywords),file="metabolic_keywords.csv", quote=F,sep=",",row.names=F)
write.table(as.data.frame(metabolic_stats$AnnualProduction),file="metabolic_output.csv", quote=F,sep=",",row.names=F)
methods_bib <- biblioAnalysis(methods, sep = ";")
methods_stats <- summary(object = methods_bib, k = 100, pause = FALSE)
write.table(as.data.frame(methods_stats$MostRelKeywords),file="methods_keywords.csv", quote=F,sep=",",row.names=F)
write.table(as.data.frame(methods_stats$AnnualProduction),file="methods_output.csv", quote=F,sep=",",row.names=F)
muscle_bib <- biblioAnalysis(muscle, sep = ";")
muscle_stats <- summary(object = muscle_bib, k = 100, pause = FALSE)
write.table(as.data.frame(muscle_stats$MostRelKeywords),file="muscle_keywords.csv", quote=F,sep=",",row.names=F)
write.table(as.data.frame(muscle_stats$AnnualProduction),file="muscle_output.csv", quote=F,sep=",",row.names=F)
neuro_bib <- biblioAnalysis(neuro, sep = ";")
neuro_stats <- summary(object = neuro_bib, k = 100, pause = FALSE)
write.table(as.data.frame(neuro_stats$MostRelKeywords),file="neuro_keywords.csv", quote=F,sep=",",row.names=F)
write.table(as.data.frame(neuro_stats$AnnualProduction),file="neuro_output.csv", quote=F,sep=",",row.names=F)
obesity_bib <- biblioAnalysis(obesity, sep = ";")
obesity_stats <- summary(object = obesity_bib, k = 100, pause = FALSE)
write.table(as.data.frame(obesity_stats$MostRelKeywords),file="obesity_keywords.csv", quote=F,sep=",",row.names=F)
write.table(as.data.frame(obesity_stats$AnnualProduction),file="obesity_output.csv", quote=F,sep=",",row.names=F)
pcos_bib <- biblioAnalysis(pcos, sep = ";")
pcos_stats <- summary(object = pcos_bib, k = 100, pause = FALSE)
write.table(as.data.frame(pcos_stats$MostRelKeywords),file="pcos_keywords.csv", quote=F,sep=",",row.names=F)
write.table(as.data.frame(pcos_stats$AnnualProduction),file="pcos_output.csv", quote=F,sep=",",row.names=F)
prostate_cancer_bib <- biblioAnalysis(prostate_cancer, sep = ";")
prostate_cancer_stats <- summary(object = prostate_cancer_bib, k = 100, pause = FALSE)
write.table(as.data.frame(prostate_cancer_stats$MostRelKeywords),file="prostate_cancer_keywords.csv", quote=F,sep=",",row.names=F)
write.table(as.data.frame(prostate_cancer_stats$AnnualProduction),file="prostate_cancer_output.csv", quote=F,sep=",",row.names=F)
puberty_bib <- biblioAnalysis(puberty, sep = ";")
puberty_stats <- summary(object = puberty_bib, k = 100, pause = FALSE)
write.table(as.data.frame(puberty_stats$MostRelKeywords),file="puberty_keywords.csv", quote=F,sep=",",row.names=F)
write.table(as.data.frame(puberty_stats$AnnualProduction),file="puberty_output.csv", quote=F,sep=",",row.names=F)
#plot(x = puberty_bib, k = 10, pause = FALSE)
#options(max.print=1000000)
quantity_bib <- biblioAnalysis(quantity, sep = ";")
quantity_stats <- summary(object = quantity_bib, k = 100, pause = FALSE)
write.table(as.data.frame(quantity_stats$MostRelKeywords),file="quantity_keywords.csv", quote=F,sep=",",row.names=F)
write.table(as.data.frame(quantity_stats$AnnualProduction),file="quantity_output.csv", quote=F,sep=",",row.names=F)
sexmed_bib <- biblioAnalysis(sexmed, sep = ";")
sexmed_stats <- summary(object = sexmed_bib, k = 100, pause = FALSE)
write.table(as.data.frame(sexmed_stats$MostRelKeywords),file="sexmed_keywords.csv", quote=F,sep=",",row.names=F)
write.table(as.data.frame(sexmed_stats$AnnualProduction),file="sexmed_output.csv", quote=F,sep=",",row.names=F)
snet_bib <- biblioAnalysis(snet, sep = ";")
snet_stats <- summary(object = snet_bib, k = 100, pause = FALSE)
write.table(as.data.frame(snet_stats$MostRelKeywords),file="snet_keywords.csv", quote=F,sep=",",row.names=F)
write.table(as.data.frame(snet_stats$AnnualProduction),file="snet_output.csv", quote=F,sep=",",row.names=F)
surgical_bib <- biblioAnalysis(surgical, sep = ";")
surgical_stats <- summary(object = surgical_bib, k = 100, pause = FALSE)
write.table(as.data.frame(surgical_stats$MostRelKeywords),file="surgical_keywords.csv", quote=F,sep=",",row.names=F)
write.table(as.data.frame(surgical_stats$AnnualProduction),file="surgical_output.csv", quote=F,sep=",",row.names=F)
test_cancer_bib <- biblioAnalysis(test_cancer, sep = ";")
test_cancer_stats <- summary(object = test_cancer_bib, k = 100, pause = FALSE)
write.table(as.data.frame(test_cancer_stats$MostRelKeywords),file="testicular_cancer_keywords.csv", quote=F,sep=",",row.names=F)
write.table(as.data.frame(test_cancer_stats$AnnualProduction),file="testicular_cancer_output.csv", quote=F,sep=",",row.names=F)
testo_therapies_bib <- biblioAnalysis(testo_therapies, sep = ";")
testo_therapies_stats <- summary(object = testo_therapies_bib, k = 100, pause = FALSE)
write.table(as.data.frame(testo_therapies_stats$MostRelKeywords),file="testo_therapies_keywords.csv", quote=F,sep=",",row.names=F)
write.table(as.data.frame(testo_therapies_stats$AnnualProduction),file="testo_therapies_output.csv", quote=F,sep=",",row.names=F)
transhealth_bib <- biblioAnalysis(transhealth, sep = ";")
transhealth_stats <- summary(object = transhealth_bib, k = 100, pause = FALSE)
write.table(as.data.frame(transhealth_stats$MostRelKeywords),file="transhealth_keywords.csv", quote=F,sep=",",row.names=F)
write.table(as.data.frame(transhealth_stats$AnnualProduction),file="transhealth_output.csv", quote=F,sep=",",row.names=F)
# (I've hidden the extraction process to make this file more concise.)
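Because the block above repeats the same four lines for every domain, the export step could also be wrapped in a small function and looped. This is a sketch under stated assumptions: `export_domain_tables()` is a hypothetical helper, `domains` in the commented loop is a hypothetical named list of the converted data frames, and `mock_stats` is a made-up stand-in for the object returned by `summary()`.

```r
# Hypothetical helper: write the keyword and annual-production tables for one domain.
# `stats` is assumed to be a list with MostRelKeywords and AnnualProduction elements,
# like the *_stats objects created above.
export_domain_tables <- function(stats, domain, out_dir = ".") {
  paths <- c(
    keywords = file.path(out_dir, paste0(domain, "_keywords.csv")),
    output   = file.path(out_dir, paste0(domain, "_output.csv"))
  )
  write.table(as.data.frame(stats$MostRelKeywords), file = paths[["keywords"]],
              quote = FALSE, sep = ",", row.names = FALSE)
  write.table(as.data.frame(stats$AnnualProduction), file = paths[["output"]],
              quote = FALSE, sep = ",", row.names = FALSE)
  invisible(paths)
}

# Demonstration with a small mock object (values are made up for illustration)
mock_stats <- list(
  MostRelKeywords  = data.frame(Keyword = c("TESTOSTERONE", "HYPOGONADISM"),
                                Articles = c(10, 4)),
  AnnualProduction = data.frame(Year = c(2015, 2016), Articles = c(3, 5))
)
written <- export_domain_tables(mock_stats, "demo", out_dir = tempdir())

# With real data, the per-domain blocks could be looped (requires bibliometrix):
# for (d in names(domains)) {
#   bib   <- biblioAnalysis(domains[[d]], sep = ";")
#   stats <- summary(object = bib, k = 100, pause = FALSE)
#   export_domain_tables(stats, d)
# }
```

A loop of this kind keeps k and the output file naming consistent across all domains, which is easy to get wrong when 25 near-identical blocks are edited by hand.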
After generating the output for each domain, I went through and included all of the relevant keywords in my sensitivity analyses to ensure that the search terms were capturing publications that belong in each domain. Here is a summary of the CVD coding process as an example.
In my summary page of all 25 domains, I have provided basic descriptives of the size of each domain after Step 2 (the “Original Total” column) and Step 3 (the “Final Totals” column). Overall, the average size of the 25 domains is about 5,000 articles (M = 4965.8, SD = 1840.3). Studies that examine testosterone therapies (n = 8,185), disorders of sex development and sex differences (n = 7,112), and aging (n = 6,498) are the largest domains, while the trans health literature is by far the smallest at only 464 articles. For those interested, I have also included a summary of how (1) excluding animal research and (2) including all WoS databases affect the size of each domain.
Now that we have our 25 domains set up, we will apply the same code we just ran to generate information about the most prominent authors, most cited publications, and most commonly occurring keywords in each of the 25 domains. Here are links to the bibliometrix analyses for each of the 25 domains:
Aging | Bone | Breast Cancer | CV Disease | Dermatology | Disorders of Sex Development | Endocrine Disrupting Chemicals | Fertility | Immunology | Interventions | Metabolic Diseases | Methods | Muscle | Neuro/Mental Health | Obesity | Polycystic Ovary Syndrome | Prostate Cancer | Puberty | Quantity | Sexual Medicine | Social Neuroendocrinology | Surgical | Testicular Cancer | Testosterone Therapies | Trans Health
To finish this exercise, let’s take a look at how each domain changes in size over time. There isn’t a lot of variability during the 1980s, so let’s use ggplot2 and plotly to graph each domain from 1990-2016.
setwd("C:/Users/soren/Google Drive/Biomedical MultipliciTs/1. Evidence Infrastructure/4. Domain Analysis")
#install.packages('ggplot2')
#install.packages('plotly')
library(ggplot2)
library(plotly)
tsearch_growth <- read.csv("tsearch_growth.csv", stringsAsFactors = FALSE)
tsearch_growth_graph <- ggplot(tsearch_growth, aes(x=year)) +
geom_smooth(aes(y = aging, colour = "Aging"), se = FALSE) +
geom_smooth(aes(y = bone, colour = "Bone"), se = FALSE) +
geom_smooth(aes(y = breast_cancer, colour = "Breast Cancer"), se = FALSE) +
geom_smooth(aes(y = cvd, colour = "CV Disease"), se = FALSE) +
geom_smooth(aes(y = derm, colour = "Dermatology"), se = FALSE) +
geom_smooth(aes(y = dsd, colour = "Dis. Sex Dev."), se = FALSE) +
geom_smooth(aes(y = edcs, colour = "EDC's"), se = FALSE) +
geom_smooth(aes(y = fertility, colour = "Fertility"), se = FALSE) +
geom_smooth(aes(y = immunology, colour = "Immunology"), se = FALSE)+
geom_smooth(aes(y = interventions, colour = "Interventions"), se = FALSE) +
geom_smooth(aes(y = metabolic_disease, colour = "Met. Disease"), se = FALSE) +
geom_smooth(aes(y = methods, colour = "Methods"), se = FALSE) +
geom_smooth(aes(y = muscle, colour = "Muscle"), se = FALSE) +
geom_smooth(aes(y = neuro, colour = "Neuro"), se = FALSE) +
geom_smooth(aes(y = obesity, colour = "Obesity"), se = FALSE) +
geom_smooth(aes(y = pcos, colour = "PCOS"), method = lm, se = FALSE) +
geom_smooth(aes(y = prostate_cancer, colour = "Prostate Cancer"), se = FALSE) +
geom_smooth(aes(y = puberty, colour = "Puberty"), se = FALSE) +
geom_smooth(aes(y = quantity, colour = "Quantity"), se = FALSE) +
geom_smooth(aes(y = sexual_medicine, colour = "Sexual Medicine"), se = FALSE) +
geom_smooth(aes(y = social_neuro, colour = "Soc. Neuroendo."), se = FALSE) +
geom_smooth(aes(y = surgical, colour = "Surgical"), se = FALSE) +
geom_smooth(aes(y = testicular_cancer, colour = "Testic. Cancer"), se = FALSE) +
geom_smooth(aes(y = testo_therapies, colour = "Testo Therapies"), se = FALSE) +
geom_smooth(aes(y = trans_health, colour = "Trans Health"), se = FALSE) +
ggtitle("Growth of Testosterone Research from 1990-2016") + labs(x="Year", y="Publication Total") +
theme(legend.title=element_blank()) + theme(legend.text=element_text(size = 8)) +
theme(legend.key=element_rect(fill='white')) + theme(panel.background=element_rect(fill = 'grey93')) +
scale_x_continuous(limits=c(1990, 2016), breaks=seq(1990, 2016, 5)) + scale_y_continuous(breaks=seq(0, 500, 50))
colors <- tsearch_growth_graph + scale_color_manual(values=c(
"#330000", "#660000", "#990000", "#CC3300", "#993300",
"#FF9900", "#FFCC00", "#66CC33", "#339900", "#336600",
"#003333", "#0066CC", "#3366CC", "#3399CC", "#6633CC",
"#660033", "#990066", "#FF99FF", "#9966FF", "#99CCFF",
"#6699CC", "#33CC66", "#CCFF33", "#FFCC00", "#CC6666"))
(gg <- ggplotly(colors))
Overall, we see marked variability in the size of these domains over time. There is noticeable growth across 20 of these domains from 1990-2016. On the other hand, three domains (dermatology, testicular cancer, and trans health) show much more marginal growth. Lastly, the breast cancer and surgical domains grow only in the early portion of this window before declining after 2010 and 1996, respectively.
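The plotting code above adds one geom_smooth() layer per column of a wide table. An alternative, sketched here, is to reshape the table to long format so that a single layer draws every domain. The `growth_wide` data frame is a toy stand-in with placeholder values and only three domains, since the contents of tsearch_growth.csv are not shown, and the reshaping uses base R only.

```r
# Toy wide-format data with the same shape as tsearch_growth (placeholder values)
growth_wide <- data.frame(
  year  = 1990:1995,
  aging = c(10, 12, 15, 19, 22, 30),
  bone  = c(5, 7, 8, 12, 13, 18),
  cvd   = c(7, 38, 41, 51, 45, 49)
)

# Reshape to long format: one row per year-domain pair
domain_cols <- setdiff(names(growth_wide), "year")
growth_long <- data.frame(
  year   = rep(growth_wide$year, times = length(domain_cols)),
  domain = rep(domain_cols, each = nrow(growth_wide)),
  count  = unlist(growth_wide[domain_cols], use.names = FALSE)
)

# Build the plot only if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  growth_plot <- ggplot(growth_long, aes(x = year, y = count, colour = domain)) +
    geom_smooth(se = FALSE) +
    labs(x = "Year", y = "Publication Total")
}
```

One trade-off is that a single layer applies the same smoother to every domain, whereas the original code fits the PCOS series with method = lm. Keeping that distinction would require a separate layer for that domain.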
Aria, M. & Cuccurullo, C. (2017) “bibliometrix: An R-tool for comprehensive science mapping analysis.” Journal of Informetrics, 11(4), 959-975.
Fine, C. (2017). Testosterone Rex: Myths of Sex, Science, and Society. WW Norton & Company.
Jordan-Young, R. & Karkazis, K. (forthcoming). Testosterone: The Unauthorized Biography. Harvard University Press.
Knorr-Cetina, K. (1999). Epistemic Cultures: How the Sciences Make Knowledge. Harvard University Press.
Knorr-Cetina, K. (2007). “Culture in Global Knowledge Societies: Knowledge Cultures and Epistemic Cultures.” Interdisciplinary Science Reviews, 32(4), 361-375.
Mol, A. (2002). The Body Multiple: Ontology in Medical Practice. Duke University Press.
Oudshoorn, N. (2003). Beyond the Natural Body: An Archaeology of Sex Hormones. Routledge.